Date Functionality
In today's lecture, we explored time series and date functionality in pandas. Pandas offers flexible tools for creating, converting, and manipulating dates and times, which support analyses such as differencing, resampling, and date-based slicing.
Timestamp
Pandas has four main time-related classes: Timestamp, DatetimeIndex, Period, and PeriodIndex.
Creating Timestamps
A Timestamp represents a single point in time. It can be created using a string or by passing multiple parameters.
# Imports used by the examples in these notes
import pandas as pd
import numpy as np
# Creating a Timestamp from a string
pd.Timestamp('9/1/2019 10:05AM')
# Creating a Timestamp by passing multiple parameters
pd.Timestamp(2019, 12, 20, 0, 0)
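As a quick check that the two construction styles describe the same thing, the sketch below builds one moment in three ways and compares the results. The variable names are illustrative only and are not from the lecture.

```python
import pandas as pd

# The same moment, built from a string, from positional parts, and from keywords
from_string = pd.Timestamp('9/1/2019 10:05AM')
from_parts = pd.Timestamp(2019, 9, 1, 10, 5)
from_keywords = pd.Timestamp(year=2019, month=9, day=1, hour=10, minute=5)

print(from_string)
print(from_string == from_parts == from_keywords)
```

Note that the string form is read month-first by default, so '9/1/2019' is 1 September, not 9 January.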
Timestamp Attributes
Timestamps have several useful attributes.
# Getting the weekday of a Timestamp (1=Monday, 7=Sunday)
pd.Timestamp(2019, 12, 20, 0, 0).isoweekday()
# Extracting the second from a Timestamp
pd.Timestamp(2019, 12, 20, 5, 2, 23).second
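The weekday numbering is easy to mix up, so here is a small sketch that puts the ISO convention next to the zero-based one for the same Timestamp. The variable names are illustrative.

```python
import pandas as pd

ts = pd.Timestamp(2019, 12, 20, 5, 2, 23)  # a Friday

iso_day = ts.isoweekday()   # ISO convention: Monday=1 ... Sunday=7
zero_day = ts.weekday()     # zero-based convention: Monday=0 ... Sunday=6
day_name = ts.day_name()    # English name of the weekday

print(iso_day, zero_day, day_name, ts.second)
```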
Period
The Period class represents a span of time rather than a specific point in time.
Creating Periods
# Creating a Period representing January 2016
pd.Period('1/2016')
# Creating a Period representing March 5, 2016
pd.Period('3/5/2016')
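To make the "span of time" idea concrete, the following sketch inspects the boundaries of a monthly Period and checks that a Timestamp in the middle of the month falls inside them. The variable names are illustrative.

```python
import pandas as pd

jan = pd.Period('1/2016')   # monthly frequency is inferred from the string

print(jan.start_time)       # first instant covered by the period
print(jan.end_time)         # last instant covered by the period

# A Timestamp in mid-January lies within the span
ts = pd.Timestamp('2016-01-15 12:00')
inside = jan.start_time <= ts <= jan.end_time
print(inside)
```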
Arithmetic with Periods
Adding or subtracting an integer shifts a Period by that many units of its own frequency: months for a monthly Period, days for a daily one.
# Adding 5 months to January 2016
pd.Period('1/2016') + 5
# Subtracting 2 days from March 5, 2016
pd.Period('3/5/2016') - 2
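The sketch below repeats the two operations and prints the inferred frequency alongside each result, which shows why the same integer means months in one case and days in the other. The variable names are illustrative.

```python
import pandas as pd

month = pd.Period('1/2016')    # inferred frequency: monthly
day = pd.Period('3/5/2016')    # inferred frequency: daily

later = month + 5              # five months after January 2016
earlier = day - 2              # two days before 5 March 2016

print(month.freqstr, later)
print(day.freqstr, earlier)
```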
DatetimeIndex and PeriodIndex
Creating DatetimeIndex
A DatetimeIndex is the index of a series of Timestamps.
t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'), pd.Timestamp('2016-09-03')])
print(t1)
print(type(t1.index)) # DatetimeIndex
Creating PeriodIndex
A PeriodIndex is the index of a series of Periods.
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'), pd.Period('2016-11')])
print(t2)
print(type(t2.index)) # PeriodIndex
Converting to Datetime
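You can convert date strings written in different formats to datetime values with pd.to_datetime.
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']
ts3 = pd.DataFrame(np.random.randint(10, 100, (4, 2)), index=d1, columns=list('ab'))
# pandas 2.0+ needs format='mixed' when the strings do not share one format; omit it on older versions
ts3.index = pd.to_datetime(ts3.index, format='mixed')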
print(ts3)
# Parsing dates in European format
pd.to_datetime('4.7.12', dayfirst=True)
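Since the handling of mixed-format lists differs between pandas versions, here is a hedged sketch that parses each string on its own, which should behave the same way across versions. It also shows how dayfirst changes the reading of an ambiguous date. The variable names are illustrative.

```python
import pandas as pd

d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']

# Parse one string at a time so each format is inferred independently
parsed = pd.DatetimeIndex([pd.to_datetime(s) for s in d1])
print(parsed)

# The same ambiguous string read month-first (default) and day-first
month_first = pd.to_datetime('4.7.12')
day_first = pd.to_datetime('4.7.12', dayfirst=True)
print(month_first, day_first)
```

By default '7/12/16' is read as 12 July 2016, so dayfirst=True matters whenever the data uses day/month ordering.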
Timedelta
A Timedelta represents a difference between two dates or times.
# Calculating the difference between two dates
pd.Timestamp('9/3/2016') - pd.Timestamp('9/1/2016')
# Adding a Timedelta to a Timestamp
pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3h')
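A short sketch of both directions: subtracting Timestamps yields a Timedelta, and adding a Timedelta yields a new Timestamp. The Timedelta is built from keyword arguments here, which is an alternative to the string form used in the lecture. The variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp('9/1/2016')
end = pd.Timestamp('9/3/2016')

gap = end - start                            # a Timedelta of two days
print(gap.days, gap.total_seconds())

# 12 days and 3 hours after 8:10 on 2 September 2016
shifted = pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta(days=12, hours=3)
print(shifted)
```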
Offset
An Offset represents a calendar-based duration, such as one week or the step to the next month end, rather than a fixed amount of elapsed time.
# Adding a week to a Timestamp
pd.Timestamp('9/4/2016') + pd.offsets.Week()
# Rolling a Timestamp forward to the end of its month
pd.Timestamp('9/4/2016') + pd.offsets.MonthEnd()
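The sketch below shows what the two offsets produce, plus one case the lecture does not cover and which is included here only as an illustration: applying MonthEnd to a date that is already a month end moves to the following month end. The variable names are illustrative.

```python
import pandas as pd

sunday = pd.Timestamp('9/4/2016')

next_week = sunday + pd.offsets.Week()        # seven days later
month_end = sunday + pd.offsets.MonthEnd()    # last day of September 2016
print(next_week, month_end)

# Starting on a month end, MonthEnd steps to the next month's end
already_end = pd.Timestamp('2016-09-30') + pd.offsets.MonthEnd()
print(already_end)
```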
Working with Dates in a DataFrame
Creating a DatetimeIndex with date_range
Using date_range, you can create a DatetimeIndex with specified start or end dates, number of periods, and frequency.
dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
print(dates)
# Creating a DataFrame with the DatetimeIndex
df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
print(df)
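To see what the '2W-SUN' frequency does, the sketch below inspects the generated dates and also builds a range anchored on an end date instead of a start date. The name last_three is illustrative.

```python
import pandas as pd

# Nine dates, two weeks apart, each falling on a Sunday
dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
print(dates[0], dates[-1])
print(dates.day_name().unique())

# Anchoring on the end date: the three days up to and including 31 December 2016
last_three = pd.date_range(end='2016-12-31', periods=3, freq='D')
print(last_three)
```

Because 1 October 2016 is a Saturday, the range starts on the first Sunday on or after it, 2 October 2016.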
Checking Day of the Week
# Integer day of the week for each index entry (0=Monday, 6=Sunday)
df.index.weekday
Calculating Differences
# Change between each row and the previous row (the first row is NaN)
df.diff()
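Because the lecture's DataFrame is random, the effect of diff is easier to see on fixed numbers. The values below are made up purely for illustration.

```python
import pandas as pd

dates = pd.date_range('2016-10-02', periods=4, freq='2W-SUN')
# Made-up counts, chosen only so the differences are easy to read
counts = pd.DataFrame({'Count 1': [100, 104, 103, 110]}, index=dates)

changes = counts.diff()   # each row minus the row before it
print(changes)
```

The first row has no predecessor, so its difference is NaN; the remaining rows hold 4, -1, and 7.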
Resampling Data
Resampling allows aggregation of data into different frequencies.
# Monthly mean; 'ME' is the month-end alias in pandas 2.2+, use 'M' on older versions
df.resample('ME').mean()
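Here is a minimal sketch of monthly resampling on fixed, made-up numbers. It passes an offset object instead of a string alias, on the assumption that this avoids the alias spelling differing between pandas versions. The variable names are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(['2016-10-02', '2016-10-16', '2016-10-30',
                        '2016-11-13', '2016-11-27'])
# Made-up values: three observations in October, two in November
counts = pd.DataFrame({'Count 1': [10, 20, 30, 40, 60]}, index=dates)

# Group the rows by calendar month and average each group
monthly = counts.resample(pd.offsets.MonthEnd()).mean()
print(monthly)
```

Each output row is labeled with the last day of its month, and its value is the mean of that month's observations.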
Datetime Indexing and Slicing
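You can use partial string indexing with .loc to filter rows by date.
# Row selection by date string goes through .loc; in pandas 2.0+ a plain df['2017'] looks for a column named '2017'
# Filtering by year
df.loc['2017']
df.loc['2016']
# Filtering by month
df.loc['2016-12']
# Filtering by a range of dates (December 2016 onward)
df.loc['2016-12':]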
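The sketch below applies partial string indexing to a deterministic version of the lecture's DataFrame, so the number of rows each selection returns can be checked. The column values and variable names are placeholders chosen for illustration.

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
# Placeholder values 0..8 so every row is easy to identify
df = pd.DataFrame({'Count 1': list(range(9))}, index=dates)

in_2017 = df.loc['2017']        # rows dated in 2017
in_dec = df.loc['2016-12']      # rows dated in December 2016
from_dec = df.loc['2016-12':]   # December 2016 through the end of the data

print(len(in_2017), len(in_dec), len(from_dec))
```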